ABSTRACT

Sophisticated man-machine interaction often requires the 
human operator to perform a stereotyped scan of various 
instruments in order to monitor and/or control a system. For 
situations in which this type of stereotyped behavior exists, 
such as certain phases of instrument flight, scan pattern has 
been shown to be altered by the imposition of simultaneous verbal 
tasks. This report describes a study designed to examine the 
relationship between pilot visual scan of instruments and mental 
workload. It was found that a verbal loading task of 
varying difficulty causes pilots to stare at the primary 
instrument as the difficulty increases and to shed looks at 
instruments of less importance. The verbal loading task
also affected the rank ordering of the scanning sequences. By
examining the behavior of pilots with widely varying skill
levels, it was suggested that these effects occur most strongly
at lower skill levels and are less apparent at high skill
levels. A graphical interpretation of the hypothetical
relationship between skill, workload, and performance is
introduced, and modelling results are presented to support this
interpretation.

In addition, a measure of the entropy of the scan is introduced;
as a measure of the randomness of the scan, it appears to be
closely related to the measured verbal task load. In a parallel
manner, the periodicity of the scan, as reflected by its
autocorrelation, was found to be of particular interest in
assessing pilot response to increasing mental workload.
SUMMARY

The experimental method described herein required pilots to 
maintain a general aviation flight simulator on a straight and 
level, constant sensitivity, Instrument Landing System (ILS)
course with a low level of turbulence. An additional periodic
verbal task, whose difficulty increased with its frequency of
presentation, was used to raise the subject's mental workload.
The subject's lookpoint on the instrument panel during each
ten-minute run was computed via a TV oculometer and stored. Several pilots ranging in skill
from novices to test pilots took part in the experiment. 

The results indicate an increase in fixation dwell times,
especially on the primary instrument, as the difficulty of the
mental loading task increases. The amount of "staring" observed appears to depend
on the level of skill of the pilot; skilled subjects appear to
stare less under increased loading than do more novice pilots.

Sequences of instrument fixations were also examined. The 
percentage occurrence of the subject's most used sequences
decreased with increased task difficulty for novice subjects but 
not for highly skilled subjects. 

Analysis of the periodicity of the subject's instrument scan
was accomplished using autocorrelation. Skilled pilots were
found, when stressed, to scan their primary instrument in a
periodic fashion. The period was related to the interval between
number task presentations. A similar result was not observed in
novice pilots. This finding suggests that skilled pilots may
handle the additional loading task in a much more systematic
fashion than do novice pilots.
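The periodicity analysis above rests on autocorrelation. The following is a minimal sketch, not the authors' program. It assumes the scan has been resampled into a hypothetical 0/1 signal that is 1 whenever the lookpoint is on the primary instrument. A strong autocorrelation peak at lag k then suggests a scan period of k samples. The function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """Normalized autocorrelation r[k], k = 0..max_lag, of a sampled signal.

    x: e.g. a hypothetical 0/1 indicator of "lookpoint on primary instrument".
    """
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]                 # remove the mean
    var = sum(v * v for v in d)               # lag-0 sum of squares
    if var == 0:
        return [0.0] * (max_lag + 1)          # constant signal: nothing periodic
    return [sum(d[i] * d[i + k] for i in range(n - k)) / var
            for k in range(max_lag + 1)]


def dominant_period(r):
    """Lag (in samples) of the highest local maximum of r beyond lag 0, or None."""
    peaks = [k for k in range(1, len(r) - 1) if r[k - 1] < r[k] >= r[k + 1]]
    return max(peaks, key=lambda k: r[k]) if peaks else None
```

At the oculometer's 30 samples per second, a peak at lag k corresponds to a period of k/30 seconds. That period is the quantity one would compare with the number-task interval.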

Entropy rate (bits/sec) of the sequence of fixations was
also used to quantify the scan pattern. It consistently
decreased for most subjects over the four loading levels used. An
exponential equation in task difficulty was found to be a good
predictor of entropy rate. When solved for task difficulty, the
equation provided an estimate of the level of task difficulty
perceived by a subject. This estimate was used to quantify the
workload of the subject.
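The following sketch shows one plausible way to compute an entropy rate in bits/sec and to invert an exponential fit for task difficulty. Two parts are assumptions made for illustration: the first-order (transition) definition of entropy, and the decaying form H = a * exp(-b * TD). The constants a and b are hypothetical per-subject fit parameters, and the function names are illustrative.

```python
import math
from collections import Counter


def transition_entropy_rate(fixations, duration_s):
    """First-order entropy rate (bits/sec) of a sequence of fixations.

    ASSUMPTION: entropy is the conditional entropy H(next | current) of
    instrument-to-instrument transitions (bits per fixation), multiplied by
    the mean fixation rate (fixations per second) over the record.
    """
    pairs = Counter(zip(fixations, fixations[1:]))    # transition counts
    from_counts = Counter(fixations[:-1])             # how often each instrument is left
    n = sum(pairs.values())
    h_bits = 0.0
    for (cur, _nxt), c in pairs.items():
        p_joint = c / n                               # P(current, next)
        p_cond = c / from_counts[cur]                 # P(next | current)
        h_bits -= p_joint * math.log2(p_cond)
    return h_bits * len(fixations) / duration_s


def perceived_task_difficulty(h_rate, a, b):
    """Solve an ASSUMED fit h_rate = a * exp(-b * TD) for TD.

    a, b: hypothetical per-subject constants (a > 0, b != 0).
    """
    return math.log(a / h_rate) / b
```

A perfectly repetitive scan gives zero entropy. A scan that chooses at random among instruments raises it.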

Piloting and number task performance measures were recorded 
and a combined performance measure was computed. This was used in 
developing a model relating performance, skill, and mental 
workload. Entropy rate of the scan was used to quantify the 
workload and skill was estimated independently via a method based 
on pilot experience. The resulting exponential model fit the data
well enough to suggest that this approach has promise in the
evaluation of interactions among these variables. 

The above results suggest the possible utility of instrument 
scan in the quantification of mental workload and/or pilot skill 
during constant piloting tasks. Methods were also suggested for
studying variations in pilot workload during short epochs, though
these have not been attempted as yet. 
This report summarizes research conducted on the
relationship between the instrument scan of an aircraft pilot and
the level of difficulty of the several tasks of flying an
airplane. The work originally concerned a specific question: the
quantitative comparison of the mental workload imposed by conventional
cockpit displays vs. novel CRT displays such as the Cockpit
Display of Traffic Information (CDTI). However, as the study
progressed, it became clear that more fundamental work on the
nature and quantification of the effects of mental workload on
visual scanning behavior was necessary before such a comparison
could be made. Thus, the research has evolved
away from the specific question first posed and toward developing
a basic understanding of visual scanning in pilots and of the
relationships between the instrument scan and piloting
performance, skill, and mental workload.

This work has yielded an experimental paradigm for studying
visual scanning behavior and several techniques for quantifying this
behavior, and has suggested a number of possible avenues for
further research. The techniques developed during the project
have been applied to several practical questions in aviation.

Preliminary experiments using the NASA Langley Terminal
Configured Vehicle (TCV) simulator with CRT instruments and a
Microwave Landing System (MLS) simulation helped to define
the requirements of an experimental protocol to study instrument
scan and pilot workload, while also illustrating the problems in
attempting to study complex man-machine interactions.

The final set of experiments described here was conducted
using a desktop general aviation simulator. The piloting task
involved maintaining this simulator on a straight and level,
constant sensitivity, Instrument Landing System (ILS) course with
a low level of turbulence. A task employing an algorithm based
on the relative magnitudes of a sequence of numbers was used to
increase the subject's mental workload. The task was presented
at periodic intervals, and its difficulty increased with the
frequency of presentation. The level of
loading for the various conditions was also estimated in an
independent series of runs using a side task. The subject's
lookpoint on the instrument panel during each ten-minute run was
computed via an oculometer and stored. A total of thirteen
pilots of varying skill participated in two sets of experiments.


Importance of Mental Workload 

The desire to measure workload is usually motivated by the 
need to predict situations in which operator performance will 
decline. The reasons for this are evident: if the operator has 
too many tasks to accomplish in too short a time, the performance 
on all or some of the tasks may be diminished. The same may be 
true if the operator allows his attention to wane because the
system he is controlling is highly automated. The latter is 
termed a condition of underload. 

Since a goal of workload measurement is the prediction of
performance, it is often suggested that performance is the
parameter which should be measured as the loading conditions are
varied. Certain performance criteria may be set, and when the
pilot cannot meet them the level of loading may be judged to be
too high. Such a technique assumes that performance varies in a
consistent fashion with loading and skill. Thus, for this
approach to be generally useful, all pilots should experience
about the same performance decrement for the same increase in
workload. Experience suggests, however, that this is not the case.
In activities such as piloting (or playing a musical instrument
or participating in an athletic event), where the simultaneous
conduct of manual and verbal or mental tasks is
especially important, the performance of a skilled operator may not
show any decrement (or may even improve) until loading is
severe, and then a precipitous decline in performance may occur.
Since the skill of commercial or test pilots is high, it is
difficult to detect subtle differences in workload via
performance decrement when they are used as subjects. One goal
of this research is a non-invasive measure of workload which
does not depend heavily on skill. Some aspects of visual
scanning behavior may yield such a measure.


Rationale for Studying the Instrument Scan 

If one hypothesizes that some repetitive piloting task will
invoke a regular visual scan (a spatial/temporal pattern of eye
movements) during instrument flight, then it may be possible to
observe changes in this scan as external factors such as noise,
interruptions or other side tasks, and fatigue interfere with
the piloting task. If this hypothesis is correct, then
alterations in the scan pattern used by the pilot may be an
indicator of fatigue or of increased/decreased mental
workload.

Various workers have analyzed subjects' visual scans in an effort to study behavior. Numerous
investigators have studied the patterns of eye movements during
the viewing of scenes, pictures, etc. (Noton and Stark, 1971;
Senders, 1970; Fisher et al., 1981). When a picture is being
viewed, it is frequently observed that, after an initial period
of general inspection of the scene, the scan tends to return
frequently to the points of highest interest to the subject.
Ambiguous figures such as the Necker cube (Ellis and Stark, 1978)
have been used to determine whether the visual scan provides a
clue to the nature of the perceived image. A common feature
of these various experiments seems to be that free eye movements
are allowed in viewing the target(s). Thus the scan
pattern which develops is driven largely by the subject and not
by the scene.
The repetitive scanning of a display in a man-machine system
may become stereotyped if the scene/task appears frequently and
requires a fixed level of performance on the part of the
operator. For example, the task of flying an airplane using
instruments for navigation requires skilled behavior and
dictates a relatively fixed scan pattern by the
pilot (Weir and Klein, 1970; Waller and Flowers, 1977). Research
on eye scanning of instruments in aircraft pilots dates from the
work of Fitts and his associates (Jones et al., 1946). Indeed,
this work on the probability of transitions between different
instruments led to the regulations establishing the familiar
"T" arrangement of the commonly used instruments in an aircraft
cockpit:


    AIRSPEED    ATTITUDE    ALTIMETER
            DIRECTIONAL GYRO


Few other studies have been conducted on scanning behavior
in pilots, probably owing to the complexity of the instrumentation
which has been required to perform such studies.
Several studies have strongly suggested the utility of scanning
behavior in assessing a variety of human factors issues in the
cockpit, however. Dick (1980), for example, has shown that there
is a strong relationship between control inputs and visual scan
strategy in pilots, demonstrating that there is typically a
visual confirmation that a commanded input has achieved a desired
change in one or more of the aircraft state variables. A recent
study (Jones et al., 1982) also suggests the utility of using
scanning information as an adjunct to pilot training. Both of
these studies used the NASA/Langley oculometer to record the
scan. This device, based on the Honeywell oculometer, is
suitable for conducting non-invasive scanning experiments in an
aircraft cockpit (Spady, 1978). The work described here attempts
to take advantage of this capability with an eye toward workload
measurement techniques which may eventually be applicable during
actual flight.


A CONCEPTUAL FRAMEWORK FOR THE STUDY

The results from some early experiments provided some
insight into several flaws in the experimental design and into the
lack of basic knowledge of scanning behavior in general. Among
the more salient problems identified were:

1. An unstated assumption of constant imposed mental loading
throughout an experimental run was invalid, since the piloting
task requirements varied considerably in different segments of
the approach. This problem is not uncommon, however, and exists in
most of the previous pilot scanning studies. The Instrument
Landing System (ILS) approach is often chosen as the piloting
task in studies of workload (Waller, 1976; Krebs and Wingert,
1976; Spady, 1977). However, the ILS approach represents a
constantly changing task difficulty as touchdown is approached
(especially due to increases in glide slope sensitivity and in the
cost of error for course deviation). This variation in the
primary task loading makes it difficult to accurately control the
amount of mental workload on the pilot as an independent
variable.

2. There was insufficient data in any segment of the run to 
allow a reasonable statistical analysis of scan factors. Since 
it was not known which factors, if any, in the scan were 
important, it was essential to first determine if any "steady 
state" effects were present in the eye movement patterns. 

3. The levels of difficulty of the verbal loading task (see
detailed description below) were not sufficient to induce large
changes in the scanning pattern. Thus, while some trends were
noted in the scan as a result of the additional imposed task,
these were not consistent, and at no time were any of the subjects
even close to being heavily loaded.

4. There was no range of pilot skill represented in the
subjects; all were highly experienced and skilled NASA test
pilots. It would seem very likely that inexperienced pilots might
perform rather differently in these types of experiments.

The above observations strongly suggested to the
investigators that a more systematic, fundamental experiment
might lead to more useful results. An inescapable conclusion may
be drawn from these observations: due to their
interrelationships, workload, skill, and performance cannot be
divorced from one another but must be studied together. The
investigator must attempt to explicitly control, or at least
have quantitative knowledge of, each of these parameters in
order to make sense of any one of them.

As a guide toward experimental design and future data 
analysis, a conceptual model of pilot behavior was developed to 
aid in our thinking. It was felt that this model should include 
the following factors: 

1. Performance - observed performance may be functionally
related to all of the other factors; if the model is to
be useful, it should predict situations in which
performance will decline

2. Pilot skill, including familiarity with the task(s) in a
particular experiment. If he or she is unfamiliar with
the task, learning may be expected during the course of
an experiment

3. Inherent difficulty in the task(s) which are performed;
some flight maneuvers are much more complicated than
others

4. Nature and number of tasks which occur simultaneously
with the primary task of flying the aircraft




5. Psychological/physiological state of the pilot; probably
quite important, but it is not clear whether these are part of
the independent or the dependent variables

6. Random noise

A hypothetical, graphical expression of these relationships
is given in figure 1. Attempts at fitting a model using these
parameters to the hypothetical situation in figure 1 will be
presented later in this discussion.



Figure 1. Hypothetical Relationship between 
Performance, Skill, and Workload 


EXPERIMENTAL PROCEDURE 

With these thoughts in mind, we set out to design a more
straightforward series of experiments which would first consider
whether it was possible to demonstrate consistent changes in
the "steady state" scanning behavior during an instrument flight
maneuver of constant difficulty in the presence of some
controlled variation in the mental difficulty of an additional task.
If it could be shown that the steady state behavior could be
altered, one might then proceed to determine the shortest epoch
over which a reasonable estimate of the effect might be made.

Three factors were controlled in the experiments: 1) a
piloting task, 2) a verbally presented mental loading task, and 3)
a workload calibration side task.

Piloting Task 

As a piloting task, we chose a simple, yet realistic, steady
state instrument maneuver which might be expected to occur for
periods of up to 10 minutes in actual flight. This time
period was chosen as an estimate of the minimum amount of time
required to provide a sufficient number of fixations to
satisfy the assumption of steady state conditions. The task
was to fly a precision straight and level course with a
zero-degree glide slope and constant localizer sensitivity while
maintaining a constant heading and airspeed in the presence of
a low level of turbulence. A schematic representation of the
task is presented in figure 2.



Figure 2. Schematic of Precision Straight and Level Flight 


Pilot lookpoint on seven instruments (Attitude Indicator
'ATT', Directional Gyro 'DG', Altimeter 'ALT', Vertical Speed
Indicator 'VSI', Airspeed 'AS', Turn and Bank 'T&B', and Glide
Slope/Localizer 'GSL') was measured using the Langley oculometer.
The oculometer can measure the time course of eye fixations on
the instruments employed by the pilot and the dwell time of each
fixation to the nearest 1/30 sec.
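The oculometer reports one instrument label per 1/30-second frame, so dwell times can be recovered by collapsing runs of identical labels. The following sketch assumes a hypothetical input format: a list of labels, with None for frames that fall on no instrument. The function name is illustrative.

```python
from itertools import groupby

FRAME_RATE_HZ = 30  # oculometer resolution stated in the text: 1/30 sec


def samples_to_fixations(samples):
    """Collapse a per-frame stream of instrument labels into fixations.

    samples: one label per 1/30-s frame, e.g. ["ATT", "ATT", "DG", ...], with
             None for frames on no instrument (an assumed convention).
    Returns a list of (instrument, dwell_seconds) tuples in temporal order.
    """
    fixations = []
    for label, run in groupby(samples):      # consecutive identical labels
        frames = sum(1 for _ in run)
        if label is not None:                # skip off-instrument gaps
            fixations.append((label, frames / FRAME_RATE_HZ))
    return fixations
```

As written, a brief off-instrument gap splits one fixation into two. A fuller pipeline might merge such fragments.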


The Mental Loading Task 

The mental loading task was chosen so as not to directly
interfere with the visual scanning of the pilot (i.e. the task
would not require the pilot to look away from the instruments)
while providing constant loading during the maneuver. This was
accomplished by having the pilot respond verbally to a series of
evenly spaced three-number sequences (Wittenborn, 1943). The
pilot was told that he must respond to each three-number
sequence by saying either "plus" or "minus" according to the
algorithm: first number largest, second number smallest = "plus"
(e.g. 5-2-4); last number largest, first number smallest =
"plus" (e.g. 1-2-3); otherwise, "minus" (e.g. 9-5-1). This
algorithm is shown graphically in Figure 3. The pilot was
instructed to give the number task priority equal to that of the
piloting task, as if the verbal questions represented a constant
rate of radio communication.

Figure 3. Mental Loading Task Algorithm
  Positive (+) number sequences: first number largest and second smallest,
    or last number largest and first smallest (examples: 8-3-6, 2-5-9)
  Negative (-) number sequences: all others (examples: 9-6-2, 3-7-4)
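The plus/minus rule can be stated compactly in code. This minimal sketch assumes the three numbers are distinct, as they are in every example given. The function name is illustrative.

```python
def number_task_answer(seq):
    """Correct response ("plus" or "minus") for a three-number sequence.

    Rule from the text: "plus" if the first number is largest and the second
    smallest, or if the last number is largest and the first smallest;
    otherwise "minus". Assumes three distinct numbers.
    """
    a, b, c = seq
    if a > c > b:       # first largest, second smallest (e.g. 5-2-4)
        return "plus"
    if c > b > a:       # last largest, first smallest (e.g. 1-2-3)
        return "plus"
    return "minus"      # all other orderings (e.g. 9-5-1)
```

Applied to the Figure 3 examples, 8-3-6 and 2-5-9 give "plus", and 9-6-2 and 3-7-4 give "minus".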

The mental workload experienced by the pilot was
hypothesized to be inversely proportional to the time interval
between number sequences. This relationship is given by the
following equation, which is arbitrarily chosen:

$$TD = \frac{1}{T} \qquad (1)$$

where TD is the imposed task difficulty and T is the interval
between task presentations, in seconds.

In order to allow a wide range of loading, the task
included intervals of continuous silence (i.e. no numbers
presented), ten, five, and two seconds, which have corresponding
task difficulties of 0.0, 0.1, 0.2, and 0.5, respectively, as
calculated from equation (1). Calibration using the side task
described below confirmed the relative difficulty of these number
intervals.
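Equation (1) and the four loading conditions translate directly into code. The following sketch represents the continuous-silence condition by an infinite interval. That is an illustrative convention, not something specified in the experiment.

```python
import math


def task_difficulty(interval_s):
    """Imposed task difficulty TD = 1 / interval, per equation (1).

    interval_s: seconds between number sequences; pass math.inf for the
    continuous-silence (no numbers) condition, which gives TD = 0.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return 1.0 / interval_s   # 1.0 / math.inf evaluates to 0.0
```

Evaluating it for silence and for 10-, 5- and 2-second intervals reproduces the difficulties 0.0, 0.1, 0.2 and 0.5.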

Numbers were generated by a computer-controlled speech
synthesizer (see hardware description below). This allowed
automated scoring of the accuracy, calculation of response
reaction times, and the possibility of temporal correlation of
visual or other responses with the verbal stimulus. The
probabilities of occurrence of "+" and "-" sequences were each
0.5. Performance was recorded by having the pilot press a
3-position rocker switch mounted on the yoke, up for plus and down
for minus.
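Automated scoring of the number task comes down to two steps: compare each switch response with the correct answer, and average the reaction times. The following sketch uses a hypothetical record format of (correct answer, response, reaction time). The function name is illustrative.

```python
from statistics import mean


def score_number_task(trials):
    """Summarize number-task performance.

    trials: list of (correct, response, reaction_time_s) tuples, where
            correct/response are "plus" or "minus" and response is None
            when the pilot did not answer (hypothetical record format).
    Returns (fraction correct, mean reaction time of answered trials or None).
    """
    hits = sum(1 for correct, resp, _ in trials if resp == correct)
    rts = [rt for _, resp, rt in trials if resp is not None]
    return hits / len(trials), (mean(rts) if rts else None)
```

Unanswered trials count as errors but contribute no reaction time.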


Visual Side Task for Workload Calibration 

The amount of mental loading imposed on the pilot by the 
number task was calibrated using a side task. The runs made with 
the side task were not used in the scanning analysis, however,
due to the alteration of normal scanning caused by the task. The 
side task employed a CRT which could display an asterisk 
appearing in the upper half or in the lower half of the screen. 
The display was mounted to the left of the simulator just outside 
the pilot's peripheral view. The asterisk appeared at random 
intervals between one and three seconds and remained on for one 
second (Ephrath, 1975). The pilot was told to turn the symbols 
off by using a three position rocker switch on the control grip. 
Moving the switch upward turned the upper asterisk off, downward 
turned the lower asterisk off. This task was done only when the 
pilot had time left from performing the primary tasks of flying 
the airplane and answering the number task. Thus the number of 
correct responses on the side task gave a measure of the residual 
capacity of the pilot from which a workload index could be 
calculated. The expression used to calculate the workload is 
given below. The constants are the weighting coefficients
obtained from a least-squares fit.


$$WLX = \frac{0.780\,(RT) + 0.626\,(MISS)}{(0.780 + 0.626)\,(NSTIM)} \times 100\ \text{percent} \qquad (2)$$

where:

WLX = workload index
RT = cumulative response time (seconds)
MISS = number of incorrect responses
NSTIM = total number of stimuli (symbols) presented
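Equation (2) can be evaluated directly. The weights 0.780 and 0.626 are the values given above; the function and argument names are illustrative. A larger index means slower or more incorrect side-task responses, that is, less residual capacity.

```python
def workload_index(rt_total_s, misses, n_stim, w_rt=0.780, w_miss=0.626):
    """Workload index WLX in percent, per equation (2).

    rt_total_s: cumulative side-task response time RT (seconds)
    misses:     number of incorrect responses MISS
    n_stim:     total number of stimuli presented NSTIM
    """
    return 100.0 * (w_rt * rt_total_s + w_miss * misses) / ((w_rt + w_miss) * n_stim)
```

If every stimulus uses its full one second and is also missed, the index reaches 100 percent. Instant, error-free responses give 0.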


Conduct of the Experiments 

Each session consisted of four 10-minute runs with a brief
break between runs. The difficulty of the mental
loading task would start at no numbers for the first run and
increase to 2-sec intervals by the fourth run. Some subjects
participated in two sessions, one without and one with the side
task. Each subject was allowed to practice all three tasks until
he felt comfortable with them. Eleven subjects ranging in skill
from NASA test pilots to non-pilots participated in the
experiments.


EQUIPMENT 

A desktop general aviation instrument flight simulator
(Analog Training Computers ATC-510) was used to simulate the
piloting task. The ATC-510 is a procedures trainer for light,
single-engine, fixed-pitch-prop, fixed-gear, IFR-equipped
aircraft. The simulator was equipped with a turbulence level
control, which was set to the first level above calm
conditions in order to force some pilot vigilance on the flight
task.
The NASA/Langley Oculometer is described elsewhere
(Middleton et al., 1977; Spady, 1978), and the interested reader
is referred to these documents. For the experiments described
here, the oculometer provided a discrete voltage level
corresponding to the current instrument fixation. This level was
based on the pilot lookpoint falling within predetermined X-Y
boundaries about each instrument on the simulator panel.
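Assigning a lookpoint to an instrument is a point-in-rectangle test. The panel coordinates in the following sketch are hypothetical placeholders arranged in the basic "T". The actual boundaries used on the simulator panel are not given in the text.

```python
from typing import Optional

# HYPOTHETICAL panel layout: instrument -> (x_min, x_max, y_min, y_max) in
# arbitrary oculometer units. The real boundaries are not given in the text.
BOUNDS = {
    "AS":  (0, 10, 10, 20),
    "ATT": (10, 20, 10, 20),
    "ALT": (20, 30, 10, 20),
    "DG":  (10, 20, 0, 10),
}


def instrument_at(x: float, y: float, bounds=BOUNDS) -> Optional[str]:
    """Return the instrument whose box contains (x, y), or None if off-panel."""
    for name, (x0, x1, y0, y1) in bounds.items():
        # Half-open intervals so a point on a shared edge maps to one box only.
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None
```

Mapping each returned name to a fixed voltage level would mimic the discrete output described above.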

The simulator panel and oculometer optical head are shown in
figure 4. 

A general purpose 8085 microprocessor development system
(Burns et al., 1979) was used to control the verbal task and the
workload calibration side task, as well as to digitize, store,
analyze, and display the scanning data from the experiments
described here. The system was equipped with 64K of RAM, an 8085
processor, two serial ports, an 8-channel/12-bit A/D converter, a
CRT controller, a speech synthesis module, two double-sided
double-density floppy disk drives with a Shugart 1403D
intelligent controller module, and a dot matrix graphics printer.
A photograph of this system is shown in figure 5. Software for
the system was written in STOIC, an interactive programming
language based on FORTH (Sachs, 1980), and in 8085 assembly
language. Details of the programs may be found in the thesis by
Stephens (1981).

Aircraft performance data were recorded during each of the
experimental runs. The data recorded included: x-coordinate of
lookpoint, y-coordinate of lookpoint, track/no track, pupil
diameter, instrument identification number, glide slope indicator
deflection, localizer indicator deflection, elevator deflection,
aileron deflection, pitch attitude, and roll attitude. These
signals were recorded on a 14-channel FM tape recorder
and digitized at NASA/Langley. Later, the digital representations
were transferred to floppy disks on the microprocessor system.
The RMS error and frequency content of the glide slope
and localizer indicator deflections were used to define the
aircraft performance for each run (see later discussion).
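The following minimal sketch computes the two performance quantities named above: the RMS deflection and its frequency content. It finds the dominant frequency with a plain discrete Fourier transform. The sampling rate fs is an assumption, since the digitizing rate is not stated here, and the function names are illustrative.

```python
import cmath
import math


def rms(signal):
    """Root-mean-square of a sampled needle deflection (zero = on course)."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def dominant_frequency(signal, fs):
    """Frequency (Hz) of the largest non-DC DFT component of a real signal.

    A plain O(n^2) DFT, adequate for short illustrative records.
    fs: sampling rate in Hz (assumed; not stated in the text).
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]              # remove DC offset
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):              # positive frequencies only
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return best_k * fs / n
```

For long records, a fast Fourier transform would replace the inner loop.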


INDEPENDENT ESTIMATE OF PILOT SKILL 

In order to assess the effects of skill on performance
and mental workload, an independent quantitative measure of
skill was needed. A model of pilot skill based on five experience
factors was used for this purpose (Hollister et al., 1973).
This model was developed in order to predict the current level of
skill of pilots flying light, single-engine aircraft.

$$\text{Skill} = 1.42 + 0.25\,(\text{recency}) + 0.73\,\log(\text{total time}) - 0.030\,(\text{years certified}) + 0.15\,\log(\text{time in type}) - 0.0088\,(\text{age}) + e \qquad (3)$$
where

Skill = score reflecting relative piloting performance
recency = number of flight hours in the past 30 days
total time = total number of flight hours
time in type = total number of hours in light single-engine aircraft
years certified = time in years since last certificate or rating
age = subject's age in years
e = residual variance not explained by the model

A raw skill score was calculated for each of the pilot
subjects using the model. The pilot with the highest resulting
skill score was then used to normalize all of the scores so that
skill levels would range between 0% and 100%. The relative skill
scores for the subjects are given in Table I.
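The skill computation and normalization can be sketched as follows. The coefficients are one reading of the partly illegible scanned equation (3), so their signs and values are an assumption. The sketch also assumes base-10 logarithms and omits the residual term e. The function names are illustrative.

```python
import math


def raw_skill(recency_h, total_h, years_certified, type_h, age_y, log=math.log10):
    """Raw skill score from equation (3), omitting the residual term e.

    ASSUMPTIONS: coefficients as read from the scanned equation; base-10
    logarithms; hours must be positive because the log terms are undefined at 0.
    """
    if total_h <= 0 or type_h <= 0:
        raise ValueError("log terms need positive flight hours")
    return (1.42 + 0.25 * recency_h + 0.73 * log(total_h)
            - 0.030 * years_certified + 0.15 * log(type_h) - 0.0088 * age_y)


def normalize(scores):
    """Scale raw scores so the highest-scoring pilot is 100 percent."""
    top = max(scores.values())
    return {pilot: 100.0 * s / top for pilot, s in scores.items()}
```

Dividing by the top score only places the best pilot at 100 percent. The lower end reaches 0 percent only if some raw score is zero.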


Pilot Number | Skill Score (%)
-------------+----------------
      3      |     100.00
      4      |      85.31
     11      |      76.64
     13      |      53.96
     15      |      38.81
      6      |      37.47
     12      |      33.23
     14      |      31.71
      8      |      22.74
      7      |      15.28
     16      |      12.83

Table I. Relative Skill Scores of All Subjects


Though care must be taken when applying an equation such as
this under a different set of experimental conditions, the
overall rank ordering of the pilots by this method is probably
accurate, as it generally agreed with subjective ratings of
the pilots' skills by experienced observers at the NASA/Langley
Research Center.


RESULTS 


Initial Data Analysis 

A set of preliminary experiments using this protocol and
apparatus was conducted during the summer of 1980. Subjects
with a wide range of skills, from non-pilots to NASA test pilots,
participated.



Ten-minute runs with the side task were performed with
three of the pilots. The workload index defined above was
determined for each of these pilots at each loading level (Table II).
The index increased monotonically for all subjects with increased
rate of presentation of the number task. The average workload
index varied from 80 percent for no mental loading task to 92
percent at the 4-second interval and 96 percent at the 2-second
interval. Although we were not able to evaluate the workload
index with all pilots, the results with these three pilots did
allow us to confirm quantitatively that the mental loading
increases as the interval between number presentations
decreases.


Pilot Number | No Loading | 4-sec Intervals | 2-sec Intervals
-------------+------------+-----------------+----------------
      9      |     87     |       93        |       95
      5      |     82     |       94        |       97
      7      |     70     |       89        |       --
   Average   |     80     |       92        |       96

Table II. Workload Side Task Results (workload index, percent)


Dwell Time Histograms 

The raw scanpath data are of the form lookpoint vs. time. An
example of the raw data is shown in figure 6. From these data,
dwell time histograms may be plotted for each instrument in the
scanpath. Examples of the results from several of these
experiments are shown in Figure 7.
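Dwell time histograms of the kind shown in Figure 7 can be built from a list of dwell times. In the following sketch the 0.25-second bin width is an illustrative choice. The last bin collects every dwell of 5 seconds or longer, matching the overflow element described in the discussion of long dwells. The function name is illustrative.

```python
import math


def dwell_histogram(dwells_s, bin_s=0.25, overflow_s=5.0):
    """Count dwells per bin of width bin_s; the last bin holds dwells >= overflow_s.

    bin_s is an illustrative choice, not the report's bin width.
    Returns a list of ceil(overflow_s / bin_s) + 1 counts.
    """
    n_bins = math.ceil(overflow_s / bin_s)
    counts = [0] * (n_bins + 1)
    for d in dwells_s:
        if d >= overflow_s:
            counts[-1] += 1                      # long "stares"
        else:
            counts[int(d // bin_s)] += 1         # regular bin
    return counts
```

Under increased loading, a novice's histogram would be expected to lose its peak near 1/2 second while the overflow bin grows.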

In the four novice subjects, the dwell time on the
primary instrument (the Attitude Indicator in all but the
non-pilot, who used the Glide Slope/Localizer) became progressively
weighted toward extremely long dwells as the verbal task
difficulty increased. Figure 7 shows the dwell time histograms
for all pilots on the Attitude Indicator, Directional
Gyro, Glide Slope/Localizer, and Vertical Speed Indicator.
First consider the plots for subject #5, who has intermediate
skill. Note that for the no loading case, the dwell histogram
on the Attitude Indicator of subjects #5, #9, and #10 has a
fairly standard shape (Harris and Christhilf, 1980). When
numbers are added to the piloting task, the dwells become longer
and the mode of the histogram at 1/2 second begins to
disappear. The effect is even more dramatic for the 2-second
interval case; the entire distribution is skewed toward
extremely long dwells on Attitude as the pilot apparently
begins to "stare" more and more at this instrument. Similar
effects are seen for pilots 9 and 10.

An interesting difference occurs for subject #7, the
non-pilot, however. This subject had no previous piloting
experience and was only given enough practice to allow him to
stay nominally on course during the precision straight and
level maneuver. Note that this subject adopted the Glide
Slope/Localizer as the primary instrument, apparently in an
effort to accomplish the precision task by keeping the needles
centered. Even though the subject adopted an inappropriate
instrument to accomplish the piloting task, the dwells on this
instrument are affected in a manner similar to those on Attitude
for the more experienced subjects.

The visual scanning behavior of the two subjects with higher
levels of skill was also affected by the verbal loading
(subjects 4 & 11 in Figure 7). However, the effect was much less
than that seen in the novice pilots. Figure 7 also shows the
dwell time histograms for the NASA test pilot, subject #4. Note
that he develops a slight stare on the Attitude Indicator for
the highest loading condition, but his histograms are
otherwise unaffected. Subject #11, who had the next highest
skill level, was somewhat more affected, especially at the
highest loading level, as indicated by the histograms for
the Attitude Indicator (Figure 7). Subject #11 uses a large
number of short dwells on the Attitude Indicator in the no
loading case. When the mental loading task is introduced at
4-second intervals, his distribution is shifted to somewhat
longer dwells. However, there is still a pronounced
peak at around 1/2 second. The actual shift in dwell times is
not as large as that seen in the novice pilots' histograms,
even though there appears to be a large change due to the
reduction in magnitude of the histogram peak.

The shift to longer dwells may also be demonstrated by
looking at the percentage change from the no loading case in the
number of dwells on the primary instrument that are 5 seconds
or longer in duration as the mental workload is changed. The raw
counts of such dwells are shown as the last element in the
histograms. Table III shows the percentage change from the no
loading case for each pilot. The percentage of long dwells is seen
to increase with decreasing skill level. This holds for
all subjects except subject #7, the non-pilot. It should be
pointed out, however, that subject #7 used a different primary
instrument from the rest of the pilots and therefore had a
completely different basic scan pattern from the other pilots,
which may not allow direct comparison of the results from
subject #7 with those of the other subjects. This is not a cause for
concern, since the results from all of the pilot subjects seem to
be consistent, and therefore any conclusions drawn from their
results should be applicable to other pilots.
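A minimal Python sketch of the long-dwell measure behind Table III follows. It assumes each primary-instrument dwell is available in seconds, and it reads "percentage change from the no loading case" as a percentage-point difference, which is consistent with the zero entries in the No Loading column. The function names and sample dwells are illustrative, not taken from the study.

```python
def percent_long_dwells(dwells_s, threshold_s=5.0):
    """Percent of dwells on one instrument lasting threshold_s seconds or longer."""
    if not dwells_s:
        return 0.0
    long_count = sum(1 for d in dwells_s if d >= threshold_s)
    return 100.0 * long_count / len(dwells_s)


def change_from_no_loading(by_condition, baseline="no loading"):
    """Percentage-point change in the long-dwell percentage relative to the baseline run."""
    base = percent_long_dwells(by_condition[baseline])
    return {cond: percent_long_dwells(d) - base for cond, d in by_condition.items()}


# Hypothetical dwells (seconds) on the primary instrument for two runs.
runs = {"no loading": [0.5, 0.8, 1.2, 6.0], "2-sec": [0.6, 5.5, 7.0, 9.0]}
print(change_from_no_loading(runs))  # {'no loading': 0.0, '2-sec': 50.0}
```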

The dwell time characteristics on secondary instruments
were most affected in the novice subjects. The secondary
instrument dwells are seen to change in a different manner than
the primary instrument dwells. As opposed to the shift to
longer dwells seen on the primary instruments, the
effect of loading on the secondary instruments is to decrease
the number of looks at those instruments, perhaps an example of a
phenomenon known as load shedding. The shape of some of the
histograms changes under varying loading conditions.
Subject #4 was the only subject whose dwell time histograms on
secondary instruments were not affected by loading. Subject
#11 appears to exhibit some load shedding, primarily on the
Altimeter and Vertical Speed Indicator.


Pilot Number | No Loading | [?]-sec Intervals | 2-sec Intervals
4 | 0 | 0.6 | 8.7
11 | 0 | 1.05 | 7.88
[?] | 0 | 6.80 | 8.46
5 | 0 | 6.50 | 20.08
10 | 0 | 10.80 | 23.30
7 | 0 | 6.00 | 15.21

Table III. Percent of Primary Instrument Dwells Greater Than
5 Seconds


Fixation Sequences 

It was of interest to examine whether pilots develop a
scan pattern or patterns during the constant flying
maneuver in this experimental paradigm. If the dwell times
on individual instruments are ignored, an ordered list of
instrument fixations may be developed for each pilot for the
various loading cases. These lists may be broken up into smaller
segments (or sequences) of various lengths for easier
analysis. Each different sequence may be considered as a
component of the overall scan pattern. One may
hypothesize that the sequences which occur most frequently
during the maneuver are those of most importance to the pilot and
are the ones which might indicate an ordered scan pattern.

Examination of the results indicated that sequences
of four instrument fixations were the longest for which there
was a significant amount of repetition during a run; hence
sequences of length four were chosen for analysis. The number of
times each four-instrument sequence occurred during a ten
minute run was obtained, as was the total number of sequences of
length four in the run. From these data, the percentage of
occurrence was calculated for each observed sequence. For
example, there might be 800 sequences of length four in 10
minutes. If the sequence ATT-DG-ALT-DG occurs 40 times
during the run, its percentage of occurrence would be 40/800
x 100 percent = 5 percent. In this fashion, the percentage
of occurrence of all length-four sequences in the no-loading
case was determined for each pilot. The 10 sequences which
occurred most frequently for each pilot were arbitrarily
chosen as indicators of the scan patterns normally used by the
various pilots. In general, the specific sequences were
different for each pilot. The manner in which the percentage
occurrences for these 10 sequences change for each subject as a
function of mental loading is shown in Figure 8. Figure 9
plots the sum of these percentages across loading for all the
subjects. It is important to note that the sequences used as
the basis for calculation for all conditions are the 10 most
frequent for the no-loading case. Each line beginning at
the no loading case and ending at the 2-sec interval case
represents the same sequence.
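The sliding-window count described above can be sketched as follows. This is an illustrative Python version, not the original analysis software: fixations are assumed to be a list of instrument labels with dwell times already discarded, and the helper names are hypothetical. It counts overlapping windows, which is the original method later revised under "A Revised Method for Calculating Entropy".

```python
from collections import Counter


def sequence_percentages(fixations, length=4):
    """Percent occurrence of each overlapping run of `length` consecutive fixations."""
    windows = [tuple(fixations[i:i + length]) for i in range(len(fixations) - length + 1)]
    total = len(windows)
    if total == 0:
        return {}
    return {seq: 100.0 * n / total for seq, n in Counter(windows).items()}


def top_sequences(fixations, length=4, k=10):
    """The k most frequent sequences and their percent occurrence."""
    pct = sequence_percentages(fixations, length)
    return sorted(pct.items(), key=lambda kv: kv[1], reverse=True)[:k]


def tracked_share(fixations, reference, length=4):
    """Summed percent occurrence in this run of sequences chosen from another run
    (e.g. the no-loading top 10 followed into a loaded run)."""
    pct = sequence_percentages(fixations, length)
    return sum(pct.get(seq, 0.0) for seq in reference)
```

A decline in `tracked_share` under loading corresponds to the falling curves described for the novice pilots; it cannot by itself distinguish randomization from a switch to different sequences.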

Several interesting observations may be made by comparing
the plots of the skilled pilots (Figures 8e and 8f) with those of
the novice subjects (Figures 8a-d). A difference may be seen
between the two groups in the percentage of occurrence of the
most often used sequences. The first 10 sequences used by the
skilled pilots comprise over 50 percent of their scan pattern
(see Figure 8). The usage of these 10 sequences is
relatively constant with changes in loading, suggesting that the
patterns are not disturbed by the verbal number task. The
novice pilots' results differ in several respects from those of
the skilled subjects, however. The 10 most frequently used
sequences in the no loading run occupy much smaller percentages
of the total scan than do those of the skilled pilots. This
suggests the novices' scans are more random than those of the
skilled subjects, even without the imposition of an
additional task.

The novice subjects also show a consistent decrease
in the percentage occurrence of the 10 sequences as the
workload is increased. This decrease may be the result of
either the equalization of the number of occurrences of each
sequence in the run (i.e., a trend toward randomization) or a change
to a different set of sequences from those used in the no loading
case.


Both of these findings strongly supported the possible utility
of the instrument scan as an indicator of workload and
skill. However, neither method seemed to allow direct comparison
between scan paths for different types of maneuvers, since
instrument usage might vary considerably for different tasks.
It thus appeared important to develop a more general analysis
method.


Quantifying Disorder in the Scanpath

Traditionally, much of the quantitative analysis of scanning
patterns has employed Markov transition probability matrices
(Stark and Ellis, 1981; Krebs and Wingert, 1976). Such
matrices do describe the predominant patterns in the scan via the
relative sizes of the transition probabilities, but it is either
extremely unwieldy or impossible to compare two of these
matrices for different experimental conditions. One of the
major goals of this research is the identification of a general
method for the study of scanning behavior. To be most useful, the
method should be independent of the number and arrangement of
instruments. The nature of the eye-point-of-regard data (sequential
instruments and dwell times) obtained from the oculometer suggests
several methods from information theory which may have this
generality.

Figure 9. Percent Occurrence of Sequence vs Loading Task

The piloting task in the current experiment is such that the
pilot's point of regard can lie either on one of the seven specified
instruments or outside the oculometer's range. Each fixation
may be of arbitrary duration. The time history of fixations has
a form which is similar to that of a communication system which
can assume eight discrete states with a varying duration in each
state (see Figure 6). The orderliness of such a system is related
to the probabilities with which it occupies its different states.
A system which always occupied the same state or always made
the same transitions between states would thus be quite orderly.
In the case of instrument scan, these situations would be
paralleled by staring and by a stereotyped scanpath, respectively.

This concept of system order may be stated compactly using 
the mathematical form for entropy from information theory. The 
entropy of a sequence is defined as (Shannon and Weaver, 1949):

(4)   $H_0 = -\sum_{i=1}^{D} p_i \log_2 p_i$

where H_0 = observed average entropy
      p_i = probability of sequence i occurring
      D   = number of different sequences in the scan

In the case of the instrument scan, entropy has the units of
bits/sequence and provides a measure of the randomness (or
orderliness) of the scanpath. The higher the entropy, the more
disorder is present in the scan. The maximum possible
entropy is constrained by the experimental conditions (see
below). The entropy measure uses the same probabilities which
are present in transition matrices, but it yields a single, more
compact expression for the overall behavior of the probabilities
rather than presenting each of them individually. This method
appears to afford some generality and has been the focus of our
recent efforts.

Note: The term entropy has been associated with
Information Theory for so long that its usage tends to
suggest an attempt to quantify the information content
of some system. However, older usage of the term comes
from thermodynamics, where entropy is used to describe
the amount of disorder present in a system. In the
present discussion it must be emphasized that there is
no attempt to quantify the amount of information which
the pilot is acquiring from his or her displays. Rather,
the mathematical form for entropy is used to compactly
describe the amount of spatial and/or temporal order
present in the pilot's scanpath, in keeping with the
meaning of entropy in thermodynamics.
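Equation (4) can be evaluated directly from a list of observed sequences. The Python sketch below (helper names are hypothetical) forms overlapping length-2 sequences from a fixation list and computes H_0 in bits/sequence. A single repeated sequence gives zero entropy, while sequences spread evenly over D kinds give log2 D bits.

```python
import math
from collections import Counter


def pairs(fixations):
    """Overlapping length-2 fixation sequences, e.g. [1, 2, 3] -> [(1, 2), (2, 3)]."""
    return list(zip(fixations, fixations[1:]))


def observed_entropy(sequences):
    """Equation (4): H0 = -sum_i p_i log2 p_i over the D distinct sequences, in bits/sequence."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

For example, `observed_entropy(pairs([1, 2, 1, 2, 1]))` sees only the sequences (1, 2) and (2, 1), each with probability 0.5, and returns 1.0 bit/sequence.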
In order to calculate the entropy of the scan, each of the
instruments to be examined was given a number. As the pilot
scanned the instrument panel, a sequence of these numbers was
stored together with the dwell time for each fixation.
While sequences of up to length 4 were considered in preliminary
analyses, the most detailed study was made on sequences of length
2, since these seemed to yield the most consistent results. The
remainder of the discussion here applies to the results
for length-2 sequences. Details of the methodology are given
elsewhere (Stephens, 1981).

Note: For short observation times, it can be shown that
the observed entropy for the instrument scan is
related to the total number of fixation sequences (L,
defined with equation 6 below) which occurred during a
run. In order to compare entropies from the scans of
different pilots for different run lengths, each
estimate of entropy had to be corrected for L and
normalized to its maximum possible value, H_max. H_max
may be calculated as follows. In the most general case,
M instruments may be arranged in some arbitrary fashion
on the cockpit panel. For a given number of instruments,
M, and sequence length N, the maximum number of
different fixation sequences is given by:

(5)   $Q = M\,(M-1)^{N-1}$

where Q = maximum number of sequences of length N.

The number of bits required to uniquely encode all
Q possible sequences is log_2 Q. It represents H_max of the
visual scan for the number of instruments and
sequence length being considered. For example, with 8
states (7 instruments + out of range), the value of Q for
sequences of two instruments is 8 x 7 = 56, which yields a
corresponding H_max = log_2 56 = 5.8 bits.

The normalized value of H may then be calculated from:

(6)   $H_{corr} = H_0 \cdot A / H_{max}$

where A = log_2 L for L < Q, and A = 1 otherwise
      L = R - N + 1 = number of sequences in a run
      R = number of fixations in a run
      N = sequence length (N = 1, 2, 3, or 4)
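Equations (5) and (6) translate into a few lines of Python. This sketch follows the printed form of equation (6) literally, including the factor A = log2 L when the run is shorter than Q. Reading (M - 1) as "an instrument is never immediately followed by itself" is an interpretation consistent with the 8 x 7 = 56 example. The defaults M = 8 and N = 2 match that example, and the function names are hypothetical.

```python
import math


def max_sequences(m, n):
    """Equation (5): Q = M (M - 1)^(N - 1) distinct sequences of length N."""
    return m * (m - 1) ** (n - 1)


def corrected_entropy(h0, num_fixations, m=8, n=2):
    """Equation (6) as printed: Hcorr = H0 * A / Hmax, with A = log2 L if L < Q else 1."""
    q = max_sequences(m, n)
    h_max = math.log2(q)               # bits needed to encode all Q sequences
    seq_count = num_fixations - n + 1  # L = R - N + 1
    a = math.log2(seq_count) if seq_count < q else 1.0
    return h0 * a / h_max
```

With 500 fixations, L = 499 exceeds Q = 56, so A = 1 and the correction reduces to dividing H_0 by about 5.8 bits.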


A Revised Method for Calculating Entropy 

The method for calculation of entropy described above has a
flaw which had to be corrected in order to ensure proper
calculation of the frequency of occurrence of different sequences.
The method described above ignores the overlap between
successive sequences. For example, the sequence 123431431 is
interpreted to include the length-four sequences 1234, 2343, 3431,
4314, 3143, and 1431. Clearly, the frequencies of sequences
determined in this fashion will be correlated and in fact do
not provide the appropriate estimate of the probability of sequence
occurrence. Consider the sequence 12121212. For purposes of our
analysis, it probably does not matter whether the sequence
1212 or 2121 is considered to occur; both represent essentially
the same pattern when a long run such as this occurs. The
pattern 12125342121, on the other hand, shows these sequences to be
different on the basis of context in the scan pattern.

Recognizing this problem, we have adopted a new method of
calculating the frequencies of the various sequences. An initial
pass is made on the data using the original method to
identify sequences. The sequence which occurs most
frequently is noted, the number of occurrences stored, and the
occurrences of this sequence are then removed from the data run by
inserting a -1 instrument code in the relevant locations. A second
pass is then made in which the most frequent valid sequence (the
-1 codes are ignored) is identified and removed. This process
continues until all independent sequences have been identified
and removed. This process ensures that no sequence is counted
twice in estimating the probabilities of occurrence of different
sequences.
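One way to implement the revised counting procedure is sketched below in Python. Two details are assumptions not settled by the text. First, a candidate sequence counts as valid only if it contains no -1 marker. Second, within a pass the occurrences of the chosen sequence are removed left to right without overlap, so the stored count is the number of occurrences actually removed.

```python
from collections import Counter

REMOVED = -1  # marker written over fixations already assigned to a sequence


def independent_sequence_counts(fixations, length=2):
    """Repeatedly find the most frequent valid sequence, record it, and blank it out."""
    data = list(fixations)
    found = {}
    while True:
        windows = Counter(
            tuple(data[i:i + length])
            for i in range(len(data) - length + 1)
            if REMOVED not in data[i:i + length]
        )
        if not windows:
            return found
        best, _ = windows.most_common(1)[0]
        removed = 0
        i = 0
        while i <= len(data) - length:
            if tuple(data[i:i + length]) == best:
                data[i:i + length] = [REMOVED] * length  # remove this occurrence
                removed += 1
                i += length                              # no overlapping reuse
            else:
                i += 1
        found[best] = found.get(best, 0) + removed
```

With the 12121212 example, the overlapping method reports three 1212 and two 2121 windows from eight fixations. The revised pass instead assigns the run to two non-overlapping 1212 sequences.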


Entropy Rate 

While entropy should help to explain the orderliness (or
lack thereof) of the scanning pattern, the development
presented up to this point does not include the fact that the
dwell time for each fixation is different. From the
preliminary results on instrument dwells, it appears rather clear
that dwell times can be markedly affected during high mental
loading. In order to include the effect of time in our measure, a
term for entropy rate was defined as:

(7)   $H_{rate} = H_0 / t$

where H_0 is the entropy for the system given by equation 4 and t
= smallest interval in which that transition occurs.

In practice, H_rate is an average value given by the
following:

(8)   $H_{rate} = \sum_{i=1}^{D} \frac{(H_{corr})_i}{DT_i}$

where (H_corr)_i = normalized entropy for the ith sequence
      DT_i       = average dwell time for the ith sequence
      D          = number of different fixation sequences

The maximum value which H_rate can assume may be calculated
using the Q determined above together with dwell time
statistics for the various instrument sequences in the scan.
While it is possible for pilots to make rather rapid glances
(with dwell times of 100 msec or less) at their instruments
(Harris and Christhilf, 1980), a fixation rate this high (10
fixations/sec) rapidly leads to oculomotor fatigue (Fnhil,
1977). A more realistic average value is probably about two
fixations/sec or less for a long period of instrument scan (say >
10 sec).

Using this value (0.5 sec/look) as the average dwell
interval, the maximum entropy rate for sequences of length two is
calculated from equation 5 to be:

$(H_{rate})_{max} = \frac{5.8\ \text{bits/sequence}}{0.5\ \text{sec/fixation} \times 2\ \text{fixations/sequence}} \approx 6\ \text{bits/sec}$

This number represents an upper bound. Since we suspect that
the pilot must exhibit some regularity in his or her scan, the
numbers we would expect to obtain under actual flight conditions
will probably be lower. The observed average H_rate in the
current experiments was on the order of 1 bit/sec. A tendency to
stare under increased load should be reflected by decreased
entropy and increased fixation times, making H_rate tend toward
lower values under such conditions. Figure 10 plots H_rate vs
number task difficulty for several pilots.
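The sketch below evaluates equation (8) as reconstructed above, together with the upper-bound arithmetic for length-2 sequences. It assumes the per-sequence normalized entropy terms and mean dwell times (in seconds) are already available. The function names are hypothetical.

```python
import math


def entropy_rate(h_corr_terms, mean_dwells_s):
    """Equation (8): H_rate = sum_i (H_corr)_i / DT_i, in bits/sec."""
    if len(h_corr_terms) != len(mean_dwells_s):
        raise ValueError("need one mean dwell time per sequence")
    return sum(h / dt for h, dt in zip(h_corr_terms, mean_dwells_s))


def max_entropy_rate(h_max_bits, sec_per_fixation=0.5, fixations_per_sequence=2):
    """Upper bound: H_max spread over the time one sequence takes to scan."""
    return h_max_bits / (sec_per_fixation * fixations_per_sequence)


bound = max_entropy_rate(math.log2(56))  # about 5.8 bits/sec, i.e. roughly 6
```

Longer dwells enlarge each DT_i and so pull H_rate down, which is the mechanism by which staring should lower the measure.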



Figure 10. Entropy Rate on Length-2 Sequences vs Imposed
Task Difficulty for 8 Pilots (Relative Skill
Levels Shown on the Right; highest = 100%)
A trend toward lower entropy rate with higher task
difficulty may be seen. A two-way analysis of variance was
performed on the entropy rate data from nine pilots, across levels of
task difficulty and between subjects. F-tests allowed rejection
of two null hypotheses: equality of mean H_rate at all loading
levels (p < 0.0[?]) and equality of mean H_rate between subjects
(p < 0.01). All six combinations of level differences in mean
H_rate were found to be statistically significant (T-test,
p < 0.06). Thus H_rate was chosen to map from scanning behavior
into task difficulty (i.e., workload).

The model used expresses H_rate as an exponential function of
TD:

(9)   $H_{rate} = 0.9279\, e^{-TD}$

This equation was obtained via a regression analysis based
on the data from seven of the pilots, with a coefficient of
determination R^2 = 97.3%. It is solved for task difficulty with
the following result:

(10)   $TD = -\left[0.06 + \ln(H_{rate})\right]$

This expression can then be used to predict the level of
task difficulty for a new subject under the conditions of the
experiment reported here.
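A short Python sketch of the mapping from entropy rate to task difficulty follows. Inverting equation (9) exactly gives TD = ln(0.9279) - ln(H_rate) = -[0.0748 + ln(H_rate)], whereas equation (10) prints the constant as 0.06. The printed value may reflect a coefficient lost in reproduction, so the sketch exposes both forms. The function names are hypothetical.

```python
import math

A0 = 0.9279  # coefficient of equation (9)


def entropy_rate_model(td):
    """Equation (9): H_rate = 0.9279 * exp(-TD), in bits/sec."""
    return A0 * math.exp(-td)


def task_difficulty_exact(h_rate):
    """Algebraic inverse of equation (9): TD = ln(0.9279) - ln(H_rate)."""
    return math.log(A0) - math.log(h_rate)


def task_difficulty_printed(h_rate):
    """Equation (10) with the constant 0.06 exactly as printed."""
    return -(0.06 + math.log(h_rate))
```

For an observed H_rate of 0.5 bits/sec, the exact inverse gives TD of about 0.62 and the printed constant gives about 0.63, so the two forms differ only slightly over the range of interest.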


Autocorrelation and Power-Spectral Density 

Another analysis method is the autocorrelation of the
instrument scan pattern. The purpose of this particular method
of analysis is to determine whether or not the pilot's scan is
altered by the mental loading number task in a periodic fashion.
One possible alteration that might be encountered is that the
frequency at which an instrument is sampled may change as the
auditory task changes. Specifically, the nature of the
relationship between instrument scan frequency and number task
presentation frequency would provide valuable hints on how
the task, and therefore the associated mental load, affects the
scanning pattern.

The autocorrelation was performed on the data as described
below. Due to the arbitrary nature of the assignment of
instrument numbers, the autocorrelation of the signal containing
all instrument numbers would not necessarily produce meaningful
results. For this reason each of the seven instruments was
examined successively by replacing the time sequence of all
instruments with a sequence {x_j(i)} whose value is 1 when
instrument j is being fixated and 0 when any other instrument is
being fixated. In order to eliminate the dc component for further
spectrum analysis, a zero-mean sequence {f_j(i)} was computed from
{x_j(i)} as follows:

(11)   $f_j(i) = x_j(i) - \bar{x}_j$

where x_j(i)      = 1 if the specified instrument j is being fixated,
                    and 0 otherwise
      $\bar{x}_j$ = mean of {x_j(i)}

The sample autocorrelation of {f_j(i)}, or sample
autocovariance of {x_j(i)}, was calculated by the formula:

(12)   $R_j(k) = \frac{1}{n} \sum_{i=1}^{n-k} f_j(i)\, f_j(i+k)$

where R_j(k) = autocorrelation sequence for instrument j
      n      = number of samples = total run duration / oculometer
               sampling period (1/30th sec)

This autocorrelation was computed for each of the seven 
instruments for each loading case on each pilot. In order to 
detect possible periodicity in the scan, the Fourier transform of 
the autocorrelation was taken to produce the power density 
spectrum. From this a value for the dominant frequency may be 
obtained. 
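The indicator sequence, equations (11) and (12), and the dominant-frequency step can be sketched with NumPy. NumPy's FFT stands in for the original microprocessor FFT package. Mirroring R(k) into a two-sided sequence before transforming is a choice made here, not something the text specifies. The 30 Hz rate comes from the text, and the function names are hypothetical.

```python
import numpy as np

SAMPLE_RATE_HZ = 30.0  # oculometer sampling rate (1/30th sec period)


def indicator(samples, instrument):
    """x_j(i): 1.0 where `instrument` is being fixated, 0.0 otherwise."""
    return (np.asarray(samples) == instrument).astype(float)


def autocorrelation(x, max_lag):
    """Equation (12): R_j(k) = (1/n) * sum_i f_j(i) f_j(i+k), with f_j = x_j - mean(x_j)."""
    f = np.asarray(x, dtype=float)
    f = f - f.mean()  # equation (11): remove the dc component
    n = len(f)
    return np.array([np.dot(f[: n - k], f[k:]) / n for k in range(max_lag + 1)])


def dominant_frequency(x, max_lag):
    """Frequency (Hz) of the largest non-zero peak in the spectrum of the autocorrelation."""
    r = autocorrelation(x, max_lag)
    r_two_sided = np.concatenate([r[:0:-1], r])  # R(-k) = R(k)
    psd = np.abs(np.fft.rfft(r_two_sided))
    freqs = np.fft.rfftfreq(len(r_two_sided), d=1.0 / SAMPLE_RATE_HZ)
    peak = 1 + np.argmax(psd[1:])                # skip the 0 Hz bin
    return freqs[peak]
```

The corresponding period is simply the reciprocal of the dominant frequency. For instance, 1/0.0928 Hz is about 10.78 s, which is how the periods quoted in the next paragraph follow from the spectral peaks.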

The power-spectral density was obtained by using a Fast
Fourier Transform (FFT) package available on the microprocessor
system. Some interesting results emerged from this analysis, the
first of which may be seen in Figure 11. This shows the
autocorrelations for pilot #4 (second highest skill level) for
his attitude indicator in each of the four different mental
loading cases. A change in the dominant frequency may be seen as
the loading is increased. The power-spectral densities shown in
Figure 12 show the dominant frequencies for the low (10-second
intervals), medium (5-second intervals), and high (2-second
intervals) levels of mental workload to be 0.0928 Hz, 0.1709 Hz,
and 0.3175 Hz, respectively. These frequencies correspond to
periods of 10.78 seconds for the low, 5.85 seconds for the
medium, and 3.15 seconds for the high level of mental workload.
These periods are closely related to the number task periods
(11, 6, and 3 sec) given by the sum of the interval between
number presentations and the time required to present the numbers.
This implies, at least for this pilot, that the loading task
directly influences the scan pattern. When no numbers are
presented, the pilot scans his instruments in a close-to-random
manner and the density spectrum exhibits no dominant frequency
(cf. Fig. 12a). When the periodic task is applied, the scan
becomes more and more periodic with increased task frequency (cf.
Figs. 12b and 12c). This suggests that the pilot has a tendency to
multiplex the flying task and the number task for greater
efficiency. Overload occurs when numbers are presented too
rapidly for the pilot to efficiently multiplex both tasks (cf.
Fig. 11d). A similar behavior is observed for all of the more highly
skilled pilots, as shown in Table IV. The periods of
oscillation for the 5 pilots of highest skill appear to match
those presented to them by the mental loading task very closely.
However, the other 6 pilots do not seem to have any consistent
pattern in their autocorrelation of sequences. Most of the
pilots showed little or no periodicity in the no-loading case.
One possible explanation of these results may be that the more highly
skilled pilots adapted their scanning to the task much faster
and better than the lower skilled subjects. DeMaio et al. (197?)
found that skilled pilots evidently developed optimum scanning
strategies when presented novel tasks much faster than unskilled
pilots. Another explanation may be that skilled pilots have a
better developed ability to time multiplex several simultaneous
tasks.

Figure 11. Autocorrelation for Pilot #4 (Relative Skill Level =
85%) Using Attitude Indicator (Dotted Lines Indicate
10-sec Intervals). Number Task Intervals and
Associated Task Difficulties are a) No Intervals - 0,
b) 10 sec - 0.1, c) 5 sec - 0.2, d) 2 sec - 0.5

Figure 12. Power Spectral Densities for Pilot #4 (Relative Skill
Level = 85%) Using Attitude Indicator (Dotted Lines
Correspond to Frequencies of 0.1, 0.2, and 0.5 Hz,
Respectively). Number Task Intervals and Associated
Task Difficulties are a) No Intervals - 0, b) 10 sec -
0.1, c) 5 sec - 0.2, d) 2 sec - 0.5

Pilot's Relative Skill Level | Task Period 11 sec | Task Period 6 sec | Task Period 3 sec
100% | 9.75 | 5.89 | 4.18
85% | 10.78 | 5.85 | 3.15
77% | 9.75 | 6.40 | 6.02
53% | 9.31 | 5.25 | 2.84
39% | 9.75 | 6.40 | 2.93
37% | 10.24 | 5.25 | 34.13
33% | 2.03 | 7.59 | 12.80
32% | 5.25 | 5.69 | 6.61
22% | 9.31 | 12.80 | 3.79
15% | 1.32 | 7.88 | 13.65
13% | 17.07 | 20.48 | 7.88

Table IV. Autocorrelation dominant periods (sec) for 9 pilots
using the attitude indicator (the glide slope/localizer for the
non-pilot) for 3 frequencies of the mental loading task.


Performance Measures 

Before discussing the modelling effort in this study, it is
necessary to mention how task performance was estimated in these
experiments. Several variables were obtained from each of the
two tasks in order to allow the computation of performance
scores. The scores developed ran between 0 percent and 100
percent, with 100 percent being obtained if the pilot never
deviated from the intended path in space on the piloting task
and if all number task sequences were answered correctly for
the mental loading number task. The scores from the piloting
and the mental loading tasks were then combined to provide a
performance measure to be used in the validation of the proposed
performance/skill/workload model.

The scoring measure for the number task was computed as
given below.

(13)   $TP = \frac{TOT - WRO - MIS}{TOT} \times 100\%$

where TP  = mental loading number task performance
      TOT = total number of stimuli presented
      WRO = number of incorrect responses
      MIS = number of missed responses

This score was 100 percent if the pilot answered every sequence
correctly and zero percent if a pilot either answered
incorrectly or missed all of the stimuli presented. Most
subjects score nearly 100% on this task if they have nothing
else to do simultaneously.
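Equation (13) amounts to the following small Python function. The name and the input checks are additions for illustration.

```python
def number_task_performance(total, wrong, missed):
    """Equation (13): TP = (TOT - WRO - MIS) / TOT x 100 percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    if wrong < 0 or missed < 0 or wrong + missed > total:
        raise ValueError("wrong and missed must be non-negative and sum to at most total")
    return 100.0 * (total - wrong - missed) / total
```

For example, 40 stimuli with 3 incorrect and 2 missed responses give TP = 87.5 percent.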

The raw data available for scoring performance on the
piloting task were the errors from the intended track for the
glide slope and localizer courses. Discussions with several
highly skilled pilots revealed that accuracy of tracking
the glide slope and localizer might not provide a complete
performance picture. These pilots were willing to trade
off "smoothness" when the loading task became more difficult;
i.e., the pilot may perform the piloting task to the same level
of accuracy, as far as deviations from a designated path are
concerned, on two different runs but produce two very different
ride qualities for these runs. One possible measure of
smoothness could be the frequency of oscillation around the
intended path: the higher this frequency is, the less "smooth"
the ride becomes. It was arbitrarily assumed that a smooth
ride would contain frequencies mostly below 0.1 Hz. Under
this assumption, measurement of the spectral component of
the aircraft dynamics above 0.1 Hz would indicate any
decrement in the ride quality.

In order to examine this measure, the power-spectral density
(PSD) of the course deviations was computed. The bandwidth
of the calculated PSD was 2.5 Hz. The "power" within a band of
frequencies may be determined by integrating the PSD over that
band (Schwartz, 1959). We chose to consider the percentage of the
spectral power which was located in the band from 0.1 to 2.5
Hz. This was calculated by subtracting the power contained
in the band from 0 to 0.1 Hz (assuming that the D.C. component
was first removed) from the total power in the spectrum, dividing
by the total power, and multiplying by 100%. This percentage of the
PSD was computed for both the glide slope and the localizer and
combined with the two RMS measures to provide four candidate
variables to be included in a performance score for the piloting task.
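The percent-of-power measure can be sketched with a NumPy periodogram. The sampling rate is an assumption: 5 Hz is what a 2.5 Hz PSD bandwidth would imply if that figure is the Nyquist limit. The function name is hypothetical.

```python
import numpy as np


def percent_power_above(deviation, fs_hz=5.0, cutoff_hz=0.1):
    """Percent of spectral power above cutoff_hz, after removing the D.C. component."""
    x = np.asarray(deviation, dtype=float)
    x = x - x.mean()                        # remove the D.C. component first
    power = np.abs(np.fft.rfft(x)) ** 2     # periodogram estimate of the PSD
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs_hz)
    total = power.sum()
    if total == 0.0:
        return 0.0                          # perfectly steady track: no oscillation
    low = power[freqs <= cutoff_hz].sum()   # band from 0 to cutoff_hz
    return 100.0 * (total - low) / total
```

As a check, a deviation made of a 0.05 Hz oscillation plus a 0.5 Hz oscillation at half the amplitude puts 0.25/1.25 = 20 percent of its power above 0.1 Hz.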
Since the pilots were instructed to give equal
priority to the piloting task and the mental loading number
task, both were included in the development of a combined
performance score. While a weighting of 0.5 might have
been assigned to each task, it was decided to leave the
weighting free to allow the model fitting procedure to
determine the relative weights. A linear relationship
between all of the terms was assumed and the form of the
equation became:

(14)   $P = CONST + a(TP) + b(RMS/GS) + c(RMS/LOC) + d(\%PWR/GS) + e(\%PWR/LOC)$

where P        = combined performance measure
      CONST    = constant term
      TP       = mental loading number task performance
      RMS/GS   = RMS error from glide slope track
      RMS/LOC  = RMS error from localizer track
      %PWR/GS  = percent of power from the power-spectral density
                 for the glide slope greater than 0.1 Hertz
      %PWR/LOC = percent of power from the power-spectral density
                 for the localizer greater than 0.1 Hertz


A Model Relating Workload, Performance, and Skill

One of the major goals of this work was the development of
a model relating performance, skill, and mental workload. The
ultimate goal is the prediction of performance given
estimates for the other two parameters. A model relating
these three parameters may be postulated from the empirical
relationship shown in Figure 1. Construction of the
model should, in fact, aid in determining whether such empirical
expressions are valid. The model chosen was of exponential form:

(15)   $P = P(0) - \exp\left[(TD/Skill)^2\right]$

This equation may be rearranged as follows:

(16)   $\exp\left[(TD/Skill)^2\right] = P(0) - P$

which states that the exponential term is equal to the
difference between the performance at the no-loading level, P(0), and
the performance at the present level of mental loading, P. Using
the values for the level of skill and task difficulty
calculated in equations 4 and 10 respectively, the left hand
side of the equation may be computed. The right hand side of
the equation must be expressed in terms of measurable performance
indicators. Making use of equation (14), the right hand side of
(16) may be expanded to yield:
(17)   $P(0) - P = a\,[TP(0) - TP] + b\,[RMS/GS(0) - RMS/GS] + c\,[RMS/LOC(0) - RMS/LOC] + d\,[\%PWR/GS(0) - \%PWR/GS] + e\,[\%PWR/LOC(0) - \%PWR/LOC]$

A multiple regression analysis was then performed on the
expanded version of equation 16 using values for each of the
indicated parameters recorded during the experiments. The data
from seven pilots were used for model development, while those
from three other subjects were used for model verification.
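The regression step can be sketched with NumPy's least-squares solver. This is an illustration under stated assumptions, not the original analysis. `deltas` holds one row per run containing the no-loading-minus-current differences of equation (17), and `lhs` holds exp((TD/Skill)^2) for the same runs. A constant column is included, as in the reported fit. The Student's t screening that later removed a term is not reproduced, and the function names are hypothetical.

```python
import numpy as np


def model_lhs(td, skill):
    """Left-hand side of equation (16): exp((TD / Skill)^2)."""
    return np.exp((np.asarray(td, dtype=float) / np.asarray(skill, dtype=float)) ** 2)


def fit_difference_model(deltas, lhs):
    """Least-squares fit of lhs = const + deltas @ coef; returns (coefficients, R^2)."""
    deltas = np.asarray(deltas, dtype=float)
    lhs = np.asarray(lhs, dtype=float)
    design = np.column_stack([np.ones(len(lhs)), deltas])  # constant column first
    coef, *_ = np.linalg.lstsq(design, lhs, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((lhs - fitted) ** 2)
    ss_tot = np.sum((lhs - lhs.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot
```

The returned coefficient vector is ordered as the constant followed by one weight per difference column, matching the layout of the fitted equation that follows.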

The results of the first attempt at regression indicated
that the coefficient of the %PWR/LOC term could not be
distinguished from zero based on a Student's t-test. This
variable was eliminated from equation 17 and the analysis was
repeated. This regression yielded non-zero values for the
coefficients a through d and included a constant term. The
resulting equation was:

(18)   $\exp\left[(TD/Skill)^2\right] = 1.4483 + 0.0351\,[TP(0) - TP] + 0.1765\,[RMS/GS(0) - RMS/GS] - 0.0366\,[RMS/LOC(0) - RMS/LOC] + 0.0377\,[\%PWR/GS(0) - \%PWR/GS]$

This analysis had an R squared value of 76.6 percent and an
F-ratio of 12.28 (p < 0.01). The coefficients determined for
equation 16 may now be used in equation 14, which becomes:

(19)   $P = 1.4483 + 0.0351\,(TP) + 0.1765\,(RMS/GS) - 0.0366\,(RMS/LOC) + 0.0377\,(\%PWR/GS)$

These coefficients provide the relative weightings for
each of the performance terms, but they need to be scaled in
order to provide the proper characteristics for the equation. If
each of the terms were at its maximum value, that is
100 percent, then the combined performance measure should also
equal 100 percent. However, using the coefficients of equation 19,
such a run would score only 22.72 percent. To scale this to 100
percent, each coefficient must be multiplied by 100/22.72 =
4.40. The modified performance equation becomes:

(20)   $P = 6.3750 + 0.1545\,(TP) + 0.7769\,(RMS/GS) - 0.1611\,(RMS/LOC) + 0.1659\,(\%PWR/GS)$
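The rescaling from equation (19) to equation (20) can be checked with a few lines of Python. The dictionary keys and helper name are illustrative. The coefficients are those printed in equation (19).

```python
# Coefficients of equation (19); "CONST" is the constant term.
EQ19 = {"CONST": 1.4483, "TP": 0.0351, "RMS/GS": 0.1765,
        "RMS/LOC": -0.0366, "%PWR/GS": 0.0377}


def score_at_max(coefs, max_value=100.0):
    """Combined score when every performance term sits at max_value percent."""
    return coefs["CONST"] + sum(c * max_value for k, c in coefs.items() if k != "CONST")


raw = score_at_max(EQ19)    # about 22.72
scale = 100.0 / raw         # about 4.40
EQ20 = {k: c * scale for k, c in EQ19.items()}
```

Rounded to four decimals, `EQ20` reproduces the coefficients of equation (20), and `score_at_max(EQ20)` returns 100 percent.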

A plot of this function versus the task difficulty, obtained from
equation 10, is provided in Figure 13. It was hoped that
these curves would resemble those given in the
hypothetical plot in Figure 1, and for some of the pilots a
general overall downward trend is present. Even though the
curves do not match the hypothetical ones exactly, there
are some common features between them. First of all, the curve
for the lowest skilled pilot, 7, is seen to decrease much more
rapidly than the curves for the more highly skilled pilots (3,
11; the two points for 3 are for the third and highest levels
of mental loading, respectively).

Figure 13. Combined Performance (From Model) vs Perceived
Task Difficulty for 7 Pilots Used in Model Development

To test this model's value as a predictive tool, the data
from three subjects not included in the model determination
were substituted into equation 17 and plotted versus perceived
task difficulty in Figure 14. Pilots 12, 8, and 16 produce
some interesting, if not consistent, results. The three
points for pilots 12 and 16 are for the second, third, and
highest loading levels. All three pilots show a net decrease in
performance between their lowest and highest task difficulties,
even though they reach this decrease in very different
ways. Pilot 8 appears to be the closest to the
theoretical model, with his sharp decrease in performance over a
very small increase in task difficulty. Pilot 16, on the other
hand, appears to be decreasing at an exponentially decreasing
rate, as opposed to the model, which predicts decreasing
performance at an exponentially increasing rate. Pilot 12's
performance increases sharply between his second and third
runs and then decreases just as sharply between the third
and fourth runs.
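The distinctions drawn above (net change from the lowest to the highest difficulty, and whether the decline speeds up or slows down) can be made mechanical. The Python sketch below is one hypothetical way to do so; it is not part of the original analysis, the function names are our own, and it assumes each pilot's runs are supplied as (perceived difficulty, combined performance) pairs with distinct difficulty values. The rate of decline is judged only from the slopes of the straight segments joining successive points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (perceived task difficulty, combined performance P)


def segment_slopes(points: Sequence[Point]) -> List[float]:
    """Slope dP/dD of each segment joining successive points, ordered by difficulty."""
    pts = sorted(points)
    slopes = []
    for (d1, p1), (d2, p2) in zip(pts, pts[1:]):
        if d2 == d1:
            raise ValueError("difficulty values must be distinct")
        slopes.append((p2 - p1) / (d2 - d1))
    return slopes


def describe_trend(points: Sequence[Point]) -> Tuple[float, str]:
    """Return the net change in P (highest minus lowest difficulty) and a rough shape label."""
    if len(points) < 2:
        raise ValueError("need at least two runs")
    pts = sorted(points)
    net_change = pts[-1][1] - pts[0][1]
    slopes = segment_slopes(pts)
    if all(s > 0 for s in slopes):
        return net_change, "increase"
    if not all(s < 0 for s in slopes):
        return net_change, "not monotonic"
    if len(slopes) < 2:
        return net_change, "decline (too few points to judge the rate)"
    pairs = list(zip(slopes, slopes[1:]))
    # Slopes becoming steeper (more negative) means the decline is accelerating.
    if all(later < earlier for earlier, later in pairs):
        return net_change, "decline at an increasing rate (as the model predicts)"
    if all(later > earlier for earlier, later in pairs):
        return net_change, "decline at a decreasing rate"
    return net_change, "decline at a mixed rate"
```

Judging from the verbal description alone, pilot 12's three points would be expected to fall in the "not monotonic" class and pilot 16's in "decline at a decreasing rate"; the sketch has not been run on the actual Figure 14 data.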

Since the choice of the exponential model for 
performance/skill/workload was arbitrary, two other forms for 
the model were also examined: circular and linear
models. Neither fit the data as well as the
exponential model, and both were therefore abandoned. The models
described here are still under development, and work is in progress
to repeat the experiments described here and to apply this methodology
to other instrument flight scenarios.
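The procedure used to compare model forms is not detailed in this section. The Python sketch below shows one generic way such a comparison could be run on a single pilot's (difficulty, performance) data: a straight line versus an assumed exponential form P = a - b·exp(c·D), in which performance falls at an exponentially increasing rate, each scored by its residual sum of squares. The exponential parameterization, the grid of rates c, and the function names are our assumptions, not the authors' model; the circular form is omitted because its equation is not given here. Difficulty values are assumed to be rescaled to roughly 0 to 1 so that the rate grid is sensible and exp() cannot overflow.

```python
import numpy as np


def sse_linear(d: np.ndarray, p: np.ndarray) -> float:
    """Residual sum of squares of the least-squares line P = a + b*D."""
    coef = np.polyfit(d, p, 1)
    return float(np.sum((p - np.polyval(coef, d)) ** 2))


def fit_exponential(d: np.ndarray, p: np.ndarray, rates=None):
    """Fit P = a - b*exp(c*D) by scanning the rate c on a grid and solving
    for (a, b) by linear least squares at each c.
    Returns (sse, a, b, c) for the best rate on the grid."""
    if rates is None:
        rates = np.linspace(0.01, 5.0, 500)  # assumed search range for c
    best = None
    for c in rates:
        design = np.column_stack([np.ones_like(d), -np.exp(c * d)])
        coef, *_ = np.linalg.lstsq(design, p, rcond=None)
        sse = float(np.sum((p - design @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(coef[0]), float(coef[1]), float(c))
    return best


def compare_forms(d, p):
    """Residual sum of squares of each candidate form (smaller is a closer fit)."""
    d = np.asarray(d, dtype=float)
    p = np.asarray(p, dtype=float)
    return {"linear": sse_linear(d, p), "exponential": fit_exponential(d, p)[0]}
```

Because the exponential form has one more free parameter than the line, it will usually leave a smaller residual even when the data carry no real curvature; with only three or four runs per pilot, a penalized criterion or held-out data would be a fairer basis for choosing between forms.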


CONCLUDING REMARKS 

Our results suggest that in a skilled task such as
piloting, where instrument scan plays an important role, the
scanning behavior may serve as an indicator of both workload and
skill. The results presented do not, at this time, seem to 
support the notion of an accurate, absolute measure of workload. 
However, a quantitative, relative comparison of mental workload 
under varied conditions does appear to be feasible. 

One implication of this effort concerns the estimation of the
workload of a new procedure that may have several possible
levels. In many cases, test pilots with superior flying
skills are used to estimate or measure
workload. This often leads to equivocal conclusions when
comparing alternative procedures or displays. The present
findings indicate that different levels of loading may be
difficult to measure in skilled subjects, since they appear
to be less sensitive to increased difficulty (see Figures 1, 9, and
11). Our results imply that pilots of moderate skill are
more sensitive to the verbal loading task. Thus, if one is
concerned with the effect of changing the level
of difficulty of some task, then, as one step in the evaluation,
the use of pilots of intermediate skill at several loading
levels would seem appropriate, since their behavior (visual
scanning and performance) will be altered more as a function
of the loading task than will that of more skilled pilots.

Another possible application may be the assessment of pilot
skills. The work presented here suggests that there is a
relationship between the scanning behavior of the pilot and his
skill level. The obvious place one might use this result is
in training. One may hypothesize that, as a pilot's skills
develop, his visual scanning behavior will be less and less
affected by non-visual increments in workload. This hypothesis
is supported by a number of our findings. It appears that as
skill increases, the percentage of long dwells decreases for a
particular loading level. The scan pattern used during a
fixed maneuver is also unaffected by verbal loading at higher
skill levels, a result supported both by the frequency of usage
of different instrument fixation sequences and by correlation
methods. This finding might be utilized in assessing pilots'
currency, competence, and level of skill; the technique might
be used to pinpoint areas which may require additional training
or practice.